VScript: Controllable Script Generation with Visual Presentation
To offer a customized script-writing tool and to inspire professional
scriptwriters, we present VScript, a controllable pipeline that generates
complete scripts, including dialogues and scene descriptions, and presents
them visually using video retrieval. Through an interactive interface, our
system allows users to select genres and input starting words that control the
theme and development of the generated script. We adopt a hierarchical
structure that first generates the plot, then the script and its visual
presentation. We also introduce a novel approach to plot-guided dialogue
generation by treating it as inverse dialogue summarization. Experimental
results show that our approach outperforms the baselines in both automatic and
human evaluations, especially in genre control.
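The hierarchical structure described above (plot first, then script, then visual presentation) could be sketched roughly as follows. All function names and return shapes here are placeholder assumptions for illustration, not the VScript authors' actual components:

```python
# Minimal sketch of a hierarchical plot -> script -> visuals pipeline,
# using placeholder generators; a real system would condition language
# models on the genre and starting words at each stage.

def generate_plot(genre: str, starting_words: str) -> str:
    # Stage 1: a plot conditioned on user-selected genre and starting words.
    return f"[{genre}] plot seeded by: {starting_words}"

def generate_script(plot: str) -> list[str]:
    # Stage 2: plot-guided dialogue generation, framed by the paper as
    # inverse dialogue summarization (expand a summary into dialogue).
    return [f"SCENE: {plot}", "A: opening line", "B: reply"]

def retrieve_videos(script_lines: list[str]) -> list[str]:
    # Stage 3: retrieve one video clip per script line (placeholder ids).
    return [f"clip-{i}" for i, _ in enumerate(script_lines)]

def vscript_pipeline(genre: str, starting_words: str):
    plot = generate_plot(genre, starting_words)
    script = generate_script(plot)
    clips = retrieve_videos(script)
    return plot, script, clips

plot, script, clips = vscript_pipeline("horror", "an empty lighthouse")
```

The key design point is that each stage consumes only the previous stage's output, so the user-facing controls (genre, starting words) need to be injected only once, at the top of the hierarchy.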
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural
language processing (NLP). Yet, what 'good generalisation' entails and how it
should be evaluated is not well understood, nor are there any common standards
to evaluate it. In this paper, we aim to lay the groundwork to improve both of
these issues. We present a taxonomy for characterising and understanding
generalisation research in NLP, we use that taxonomy to present a comprehensive
map of published generalisation studies, and we make recommendations for which
areas might deserve attention in the future. Our taxonomy is based on an
extensive literature review of generalisation research, and contains five axes
along which studies can differ: their main motivation, the type of
generalisation they aim to solve, the type of data shift they consider, the
source by which this data shift is obtained, and the locus of the shift within
the modelling pipeline. We use our taxonomy to classify over 400 previous
papers that test generalisation, for a total of more than 600 individual
experiments. Considering the results of this review, we present an in-depth
analysis of the current state of generalisation research in NLP, and make
recommendations for the future. Along with this paper, we release a webpage
where the results of our review can be dynamically explored, and which we
intend to update as new NLP generalisation studies are published. With this
work, we aim to make steps towards making state-of-the-art generalisation
testing the new status quo in NLP.
Comment: 35 pages of content + 53 pages of references.
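The five axes of the taxonomy can be pictured as a simple record type, where each classified experiment gets one value per axis. The enum members and field values below are illustrative assumptions, not the authors' exact category names:

```python
from dataclasses import dataclass
from enum import Enum

class Motivation(Enum):
    # Axis 1: why the generalisation study was done (assumed labels).
    PRACTICAL = "practical"
    COGNITIVE = "cognitive"
    INTRINSIC = "intrinsic"
    FAIRNESS = "fairness"

class ShiftLocus(Enum):
    # Axis 5: where in the modelling pipeline the shift occurs (assumed labels).
    TRAIN_TEST = "train-test"
    FINETUNE_TEST = "finetune-test"
    PRETRAIN_TEST = "pretrain-test"

@dataclass(frozen=True)
class GeneralisationExperiment:
    motivation: Motivation       # axis 1: main motivation
    generalisation_type: str     # axis 2: e.g. "compositional", "cross-lingual"
    shift_type: str              # axis 3: type of data shift considered
    shift_source: str            # axis 4: how the shift was obtained
    shift_locus: ShiftLocus      # axis 5: locus of the shift in the pipeline

# Classifying one hypothetical experiment along the five axes:
exp = GeneralisationExperiment(
    motivation=Motivation.PRACTICAL,
    generalisation_type="cross-lingual",
    shift_type="covariate shift",
    shift_source="naturally occurring",
    shift_locus=ShiftLocus.TRAIN_TEST,
)
print(exp.motivation.value)  # practical
```

Making the record immutable (`frozen=True`) reflects how the review uses the taxonomy: each of the 600+ experiments is assigned a fixed position along the five axes, which can then be aggregated to map the field.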
A taxonomy and review of generalization research in NLP
Funding Information: We thank A. Williams, A. Joulin, E. Bruni, L. Weber, R. Kirk and S. Riedel for providing feedback on the various stages of this paper, and G. Marcus for providing detailed feedback on the final draft. We also thank the reviewers of our work for providing useful comments. We thank E. Hupkes for making the app that allows searching through references, and we thank D. Haziza and E. Takmaz for other contributions to the website. M.G. was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 819455). V.D. was supported by the UKRI Centre for Doctoral Training in Natural Language Processing, funded by the UKRI (grant no. EP/S022481/1) and the University of Edinburgh. N.S. was supported by the Hyundai Motor Company (under the project Uncertainty in Neural Sequence Modeling) and the Samsung Advanced Institute of Technology (under the project Next Generation Deep Learning: From Pattern Recognition to AI). Publisher Copyright: © 2023, The Author(s). Peer reviewed. Publisher PDF.
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Code-switching is a speech phenomenon occurring when a speaker switches
language during a conversation. Despite the spontaneous nature of
code-switching in conversational spoken language, most existing works collect
code-switching data from read speech instead of spontaneous speech. ASCEND (A
Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English
code-switching corpus built on spontaneous multi-turn conversational dialogue
sources collected in Hong Kong. We report ASCEND's design and procedure for
collecting the speech data, including annotations. ASCEND consists of 10.62
hours of clean speech, collected from 23 bilingual speakers of Chinese and
English. Furthermore, we conduct baseline experiments using pre-trained wav2vec
2.0 models, achieving a best performance of 22.69% character error rate and
27.05% mixed error rate.
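Character error rate, the headline metric above, is conventionally the Levenshtein edit distance between reference and hypothesis transcripts divided by the reference length. A minimal sketch (this is the standard definition, not the ASCEND authors' exact evaluation script):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences (one-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance(ref[:0], hyp[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution / match
            )
            prev = cur
    return dp[n]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substituted character in an 8-character code-switched reference:
print(round(cer("你好 world", "你号 world"), 3))  # 0.125
```

For code-switched speech, operating at the character level sidesteps the question of what counts as a "word" across Chinese and English; the paper's mixed error rate similarly combines character-level scoring for Chinese with word-level scoring for English.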
A taxonomy and review of generalization research in NLP
Funder: N.S. was supported by the Hyundai Motor Company (under the project Uncertainty in Neural Sequence Modeling) and the Samsung Advanced Institute of Technology (under the project Next Generation Deep Learning: From Pattern Recognition to AI).
Abstract: The ability to generalize well is one of the primary desiderata for models of natural language processing (NLP), but what 'good generalization' entails and how it should be evaluated is not well understood. In this Analysis we present a taxonomy for characterizing and understanding generalization research in NLP. The proposed taxonomy is based on an extensive literature review and contains five axes along which generalization studies can differ: their main motivation, the type of generalization they aim to solve, the type of data shift they consider, the source by which this data shift originated, and the locus of the shift within the NLP modelling pipeline. We use our taxonomy to classify over 700 experiments, and we use the results to present an in-depth analysis that maps out the current state of generalization research in NLP and make recommendations for which areas deserve attention in the future.